fix(proxy): add backpressure handling to prevent hang on large responses #16

Merged: nik-localstack merged 3 commits into master from fix/proxy-backpressure-large-payloads on May 18, 2026

nik-localstack commented May 16, 2026

Summary

  • When forwarding large responses, the proxy could deadlock: the receiver's TCP buffer fills up, which fills the proxy's send buffer, which stalls the sender via flow control. With no EVENT_WRITE handling the proxy had no way to retry the stalled send.
  • BlockingIOError and ssl.SSLWantWriteError (both subclasses of OSError) were being swallowed by the generic connection-close handler, tearing down the connection instead of retrying. SSLWantWriteError is raised by non-blocking SSL sockets when the underlying TCP buffer is full — distinct from BlockingIOError on plain sockets.
  • The deadlock is reliably triggered by responses larger than the TCP send buffer (confirmed at >400 KB on macOS localhost; threshold varies by OS and kernel tuning).
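The stall described above can be reproduced in isolation. This is a minimal sketch, not the proxy's code: it uses a local `socket.socketpair()` in place of real client and server connections, and `fill_send_buffer` is a hypothetical helper name. It shows a non-blocking `send()` raising once the peer stops reading.

```python
import socket
import ssl


def fill_send_buffer(sock: socket.socket, chunk: bytes = b"x" * 65536) -> int:
    """Send on a non-blocking socket until the kernel refuses more data.

    Returns the number of bytes accepted before the send would block.
    """
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except (BlockingIOError, ssl.SSLWantWriteError):
            # Buffer full: the peer is not draining its side.
            return sent


# Stand-in for proxy -> receiver; the receiving end never reads.
a, b = socket.socketpair()
a.setblocking(False)
accepted = fill_send_buffer(a)
a.close()
b.close()
```

The value of `accepted` depends on the OS and socket type, which is consistent with the threshold varying by platform.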

Changes

proxy.py

  • Catch BlockingIOError / ssl.SSLWantWriteError explicitly before OSError in both send paths; register EVENT_WRITE on the destination socket so the selector retries the flush when buffer space is available.
  • Add an EVENT_WRITE handler on conn itself to drain any backlogged out_bytes after a previous partial-send stall.
  • Add return guards after connection close in the EVENT_READ path to prevent fall-through into stale redirect_conn state (double-unregister / use-after-close).
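A rough sketch of the retry pattern these changes describe, under stated assumptions: `Conn`, `flush`, and `demo` are hypothetical names, a `socketpair` stands in for the real connections, and the actual `proxy.py` logic may be structured differently. The idea is to watch `EVENT_WRITE` only while `out_bytes` holds a backlog, then fall back to `EVENT_READ`.

```python
import selectors
import socket
import ssl
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical per-socket state; the real proxy's class may differ."""
    out_bytes: bytes = b""
    events: int = selectors.EVENT_READ


def flush(selector, sock, conn):
    """Send as much of conn.out_bytes as the kernel accepts right now."""
    try:
        sent = sock.send(conn.out_bytes)
        conn.out_bytes = conn.out_bytes[sent:]
    except (BlockingIOError, ssl.SSLWantWriteError):
        pass  # buffer full; keep the backlog and wait for EVENT_WRITE
    # Watch for writability only while a backlog remains.
    wanted = selectors.EVENT_READ
    if conn.out_bytes:
        wanted |= selectors.EVENT_WRITE
    if wanted != conn.events:
        conn.events = wanted
        selector.modify(sock, wanted, data=conn)


def demo(payload_size=1024 * 1024):
    selector = selectors.DefaultSelector()
    src, dst = socket.socketpair()
    src.setblocking(False)
    dst.setblocking(False)
    src_conn = Conn(out_bytes=b"x" * payload_size)
    dst_conn = Conn()
    selector.register(src, src_conn.events, data=src_conn)
    selector.register(dst, dst_conn.events, data=dst_conn)
    flush(selector, src, src_conn)  # first attempt; may be partial

    received = 0
    while received < payload_size:
        ready = selector.select(timeout=5)
        if not ready:
            break  # nothing progressed; give up rather than spin
        for key, mask in ready:
            if key.fileobj is dst and mask & selectors.EVENT_READ:
                received += len(dst.recv(65536))
            elif key.fileobj is src and mask & selectors.EVENT_WRITE:
                flush(selector, src, key.data)
    selector.close()
    src.close()
    dst.close()
    return received, src_conn.events


received, final_events = demo()
```

Once the backlog drains, `flush` switches the registration back to `EVENT_READ`, so the selector does not keep waking up for a socket with nothing left to send.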

tests/test_proxy.py

  • Add test_various_payload_sizes parametrized over 1 B, 1 KB, 100 KB, 1 MB, 10 MB and 10 k / 100 k rows, run over both plain and SSL connections. Without the fix the 1 MB+ cases hang.
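One way to read the parametrization, assuming every payload runs once per transport: seven payloads times two transports gives the 14 cases mentioned in the test plan. The labels below are hypothetical and only illustrate the cross product; the real test's parameter names and ids may differ.

```python
from itertools import product

# Hypothetical labels for the payloads and transports listed in the PR.
PAYLOADS = ["1B", "1KB", "100KB", "1MB", "10MB", "10k_rows", "100k_rows"]
TRANSPORTS = ["plain", "ssl"]

# Cross product: each payload is exercised over each transport.
CASES = [f"{payload}-{transport}" for payload, transport in product(PAYLOADS, TRANSPORTS)]
```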

Test plan

  • pytest tests/test_proxy.py -k test_various_payload_sizes — all 14 cases pass
  • pytest tests/test_proxy.py — existing tests unchanged

🤖 Generated with Claude Code

bentsku left a comment

Nice one, this is actually a bug I introduced while trying to fix the busy waiting. I didn't realize at the time that I should call selector.modify to remove the always-set EVENT_WRITE from before.

I can see that changing events from the selector is definitely the way, found this online too.

I introduced it with #4, but at the time the events were never changed and were set to both read & write, so the selector would busy loop. This dynamic selection is much better, as it's definitely true that the next_conn (naming is hard) might be stuck for writes.

Nice find! LGTM 👍

Let's maybe update the setup.py for the version and the changelog and let's get it released!
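For context on the busy loop mentioned in the review, here is a small standalone illustration (not taken from the proxy; it uses a local `socketpair`): a socket registered for `EVENT_WRITE` with nothing queued is reported ready immediately, while the same socket with only `EVENT_READ` stays quiet until data arrives.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)

# Registered for EVENT_WRITE with nothing to send: the socket is writable
# as long as its send buffer has room, so select() returns at once.
selector.register(a, selectors.EVENT_READ | selectors.EVENT_WRITE)
ready_always = selector.select(timeout=0)

# After dropping EVENT_WRITE only EVENT_READ is watched; with no incoming
# data the selector reports nothing and an event loop could sleep.
selector.modify(a, selectors.EVENT_READ)
ready_read_only = selector.select(timeout=0)

selector.close()
a.close()
b.close()
```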

Comment thread postgresql_proxy/proxy.py
Comment on lines +268 to +269
conn.events = selectors.EVENT_READ
self.selector.modify(sock, selectors.EVENT_READ, data=conn)

Tiny nit: in the last self.selector.modify we pass conn.events, I guess because it is a merge of both write and read and would be long to spell out. Not sure if we want to keep it for consistency, but it's not blocking; it doesn't really matter.

nik-localstack and others added 3 commits May 18, 2026 17:19
When forwarding large responses, the proxy's send to the destination
socket can block: the receiver's TCP buffer fills up, which fills the
proxy's send buffer, which stalls the sender via flow control. With no
EVENT_WRITE handling the proxy had no way to retry the stalled send,
causing a deadlock for responses larger than the TCP send buffer (~400KB
on macOS, ~256KB on Linux).

Catch BlockingIOError explicitly (before the generic OSError handler)
and register the socket for EVENT_WRITE so the selector retries the
flush when buffer space becomes available. Also add return guards after
connection close in the EVENT_READ path to prevent fall-through into the
now-stale redirect_conn state.

Add test_various_payload_sizes covering 1B, 1KB, 100KB, 1MB, 10MB and
10k/100k rows, over both plain and SSL connections, to catch regressions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ssl.SSLSocket.send() raises ssl.SSLWantWriteError (not BlockingIOError)
when the underlying TCP buffer is full on a non-blocking SSL socket.
SSLWantWriteError is a subclass of OSError, so it was caught by the
generic connection-close handler, closing the connection mid-response.
The client socket stayed open, leaving the caller hanging indefinitely.

Catch SSLWantWriteError alongside BlockingIOError in both send paths so
SSL connections correctly register EVENT_WRITE and retry when buffer
space becomes available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
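The handler ordering described in the commit messages above can be illustrated with a small sketch. `handle_send_error` is a hypothetical helper, not a function from `proxy.py`; it only shows why the specific "would block" exceptions have to be matched before the generic `OSError` clause.

```python
import ssl


def handle_send_error(exc: Exception) -> str:
    """Classify a send failure as 'retry' (buffer full) or 'close'."""
    try:
        raise exc
    except (BlockingIOError, ssl.SSLWantWriteError):
        # Must come first: both are OSError subclasses.
        return "retry"
    except OSError:
        # Anything else is treated as a real connection failure.
        return "close"


plain_result = handle_send_error(BlockingIOError())
ssl_result = handle_send_error(ssl.SSLWantWriteError("want write"))
reset_result = handle_send_error(ConnectionResetError())
```

If the two `except` clauses were swapped, the `OSError` clause would match first and every stalled send would be treated as a connection failure.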
nik-localstack force-pushed the fix/proxy-backpressure-large-payloads branch from 0ea4e3d to 780992b May 18, 2026 14:27
nik-localstack merged commit fb7c0f4 into master May 18, 2026
1 check passed
2 participants